Objective: Determine trends in activity patterns with respect to both weather and routine to predict and inform/improve future behavior.
Install packages.
devtools::install_github("avsecz/fitbitr")
Downloading GitHub repo avsecz/fitbitr@master
from URL https://api.github.com/repos/avsecz/fitbitr/zipball/master
Installing fitbitr
Downloading GitHub repo peterhartman/httr@peterhartman-oauth2-clientcredentials
from URL https://api.github.com/repos/peterhartman/httr/zipball/peterhartman-oauth2-clientcredentials
Installing httr
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/RtmpOV1BHK/devtools14b997986ef/peterhartman-httr-a59909d' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘httr’ ...
** R
** demo
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (httr)
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/RtmpOV1BHK/devtools14b998847e0/Avsecz-fitbitr-4639697' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘fitbitr’ ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (fitbitr)
devtools::install_github("r-lib/httr#485")
Downloading GitHub repo peterhartman/httr@peterhartman-oauth2-clientcredentials
from URL https://api.github.com/repos/peterhartman/httr/zipball/peterhartman-oauth2-clientcredentials
Installing httr
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/RtmpOV1BHK/devtools14b94aadb254/peterhartman-httr-a59909d' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘httr’ ...
** R
** demo
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (httr)
install.packages("jsonlite")
Error in install.packages : Updating loaded packages
install.packages("mongolite")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/mongolite_1.2.tgz'
Content type 'application/x-gzip' length 1779767 bytes (1.7 MB)
==================================================
downloaded 1.7 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("lubridate")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/lubridate_1.7.1.tgz'
Content type 'application/x-gzip' length 1175409 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("RCurl")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/RCurl_1.95-4.8.tgz'
Content type 'application/x-gzip' length 892550 bytes (871 KB)
==================================================
downloaded 871 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("jsonlite")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/jsonlite_1.5.tgz'
Content type 'application/x-gzip' length 1114207 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("XML")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.9.tgz'
Content type 'application/x-gzip' length 1923850 bytes (1.8 MB)
==================================================
downloaded 1.8 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("ggplot2")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/ggplot2_2.2.1.tgz'
Content type 'application/x-gzip' length 2792414 bytes (2.7 MB)
==================================================
downloaded 2.7 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("Metrics")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/Metrics_0.1.3.tgz'
Content type 'application/x-gzip' length 63765 bytes (62 KB)
==================================================
downloaded 62 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
install.packages("TTR")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/TTR_0.23-2.tgz'
Content type 'application/x-gzip' length 438711 bytes (428 KB)
==================================================
downloaded 428 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//RtmpOV1BHK/downloaded_packages
Load packages.
library("fitbitr")
library("httr")
library("jsonlite")
library("mongolite")
library("lubridate")
package ‘lubridate’ was built under R version 3.4.2
Attaching package: ‘lubridate’
The following object is masked from ‘package:base’:
date
library("RCurl")
Loading required package: bitops
library("XML")
library("ggplot2")
library("Metrics")
package ‘Metrics’ was built under R version 3.4.2
library("TTR")
token <- get_fitbit_token()
Use a local file ('.httr-oauth'), to cache OAuth access credentials between R sessions?
1: Yes
2: No
2
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
Pull raw activity data (JSON) from Fitbit API from 2016-06-23 (when I first began wearing my fitbit) to present (2017-12-02).
req <- fitbit_GET("1/user/4QXD3G/activities/steps/date/2016-06-23/2017-12-02.json", token = token)
output <- toJSON(fitbit_parse(req))
output
{"activities-steps":[{"dateTime":["2016-06-23"],"value":["5123"]},{"dateTime":["2016-06-24"],"value":["7219"]},{"dateTime":["2016-06-25"],"value":["5713"]},{"dateTime":["2016-06-26"],"value":["22149"]},{"dateTime":["2016-06-27"],"value":["16140"]},{"dateTime":["2016-06-28"],"value":["7348"]},{"dateTime":["2016-06-29"],"value":["5170"]},{"dateTime":["2016-06-30"],"value":["7440"]},{"dateTime":["2016-07-01"],"value":["3267"]},{"dateTime":["2016-07-02"],"value":["4687"]},{"dateTime":["2016-07-03"],"value":["17418"]},{"dateTime":["2016-07-04"],"value":["6517"]},{"dateTime":["2016-07-05"],"value":["7512"]},{"dateTime":["2016-07-06"],"value":["12951"]},{"dateTime":["2016-07-07"],"value":["14382"]},{"dateTime":["2016-07-08"],"value":["10925"]},{"dateTime":["2016-07-09"],"value":["8606"]},{"dateTime":["2016-07-10"],"value":["1523"]},{"dateTime":["2016-07-11"],"value":["10106"]},{"dateTime":["2016-07-12"],"value":["8176"]},{"dateTime":["2016-07-13"],"value":["9805"]},{"dateTime":["2016-07-14"],"value":["9134"]},{"dateTime":["2016-07-15"],"value":["8771"]},{"dateTime":["2016-07-16"],"value":["4685"]},{"dateTime":["2016-07-17"],"value":["15721"]},{"dateTime":["2016-07-18"],"value":["7536"]},{"dateTime":["2016-07-19"],"value":["5098"]},{"dateTime":["2016-07-20"],"value":["10481"]},{"dateTime":["2016-07-21"],"value":["9565"]},{"dateTime":["2016-07-22"],"value":["8790"]},{"dateTime":["2016-07-23"],"value":["8589"]},{"dateTime":["2016-07-24"],"value":["12903"]},{"dateTime":["2016-07-25"],"value":["5521"]},{"dateTime":["2016-07-26"],"value":["5657"]},{"dateTime":["2016-07-27"],"value":["9281"]},{"dateTime":["2016-07-28"],"value":["6921"]},{"dateTime":["2016-07-29"],"value":["11778"]},{"dateTime":["2016-07-30"],"value":["9543"]},{"dateTime":["2016-07-31"],"value":["1426"]},{"dateTime":["2016-08-01"],"value":["4998"]},{"dateTime":["2016-08-02"],"value":["8208"]},{"dateTime":["2016-08-03"],"value":["4774"]},{"dateTime":["2016-08-04"],"value":["10350"]},{"dateTime":["2016-08-05"],"value":["6341"]},{"dateTime":["2016-08-06"],"value":["12787"]},{"dateTime":["2016-08-07"],"value":["6120"]},{"dateTime":["2016-08-08"],"value":["8682"]},{"dateTime":["2016-08-09"],"value":["7093"]},{"dateTime":["2016-08-10"],"value":["9225"]},{"dateTime":["2016-08-11"],"value":["9347"]},{"dateTime":["2016-08-12"],"value":["6664"]},{"dateTime":["2016-08-13"],"value":["4115"]},{"dateTime":["2016-08-14"],"value":["18061"]},{"dateTime":["2016-08-15"],"value":["5108"]},{"dateTime":["2016-08-16"],"value":["10738"]},{"dateTime":["2016-08-17"],"value":["13409"]},{"dateTime":["2016-08-18"],"value":["5212"]},{"dateTime":["2016-08-19"],"value":["3768"]},{"dateTime":["2016-08-20"],"value":["4639"]},{"dateTime":["2016-08-21"],"value":["4129"]},{"dateTime":["2016-08-22"],"value":["4738"]},{"dateTime":["2016-08-23"],"value":["4413"]},{"dateTime":["2016-08-24"],"value":["3405"]},{"dateTime":["2016-08-25"],"value":["3006"]},{"dateTime":["2016-08-26"],"value":["4294"]},{"dateTime":["2016-08-27"],"value":["5154"]},{"dateTime":["2016-08-28"],"value":["5553"]},{"dateTime":["2016-08-29"],"value":["3445"]},{"dateTime":["2016-08-30"],"value":["7371"]},{"dateTime":["2016-08-31"],"value":["3914"]},{"dateTime":["2016-09-01"],"value":["4140"]},{"dateTime":["2016-09-02"],"value":["8066"]},{"dateTime":["2016-09-03"],"value":["10328"]},{"dateTime":["2016-09-04"],"value":["16223"]},{"dateTime":["2016-09-05"],"value":["10429"]},{"dateTime":["2016-09-06"],"value":["11738"]},{"dateTime":["2016-09-07"],"value":["7731"]},{"dateTime":["2016-09-08"],"value":["8752"]},{"dateTime":["2016-09-09"],"value":["11706"]},{"dateTime":["2016-09-10"],"value":["12265"]},{"dateTime":["2016-09-11"],"value":["11252"]},{"dateTime":["2016-09-12"],"value":["6379"]},{"dateTime":["2016-09-13"],"value":["12965"]},{"dateTime":["2016-09-14"],"value":["6851"]},{"dateTime":["2016-09-15"],"value":["7824"]},{"dateTime":["2016-09-16"],"value":["7586"]},{"dateTime":["2016-09-17"],"value":["11544"]},{"dateTime":["2016-09-18"],"value":["7880"]},{"dateTime":["2016-09-19"],"value":["8834"]},{"dateTime":["2016-09-20"],"value":["6338"]},{"dateTime":["2016-09-21"],"value":["6306"]},{"dateTime":["2016-09-22"],"value":["10333"]},{"dateTime":["2016-09-23"],"value":["4102"]},{"dateTime":["2016-09-24"],"value":["1835"]},{"dateTime":["2016-09-25"],"value":["6054"]},{"dateTime":["2016-09-26"],"value":["6148"]},{"dateTime":["2016-09-27"],"value":["8178"]},{"dateTime":["2016-09-28"],"value":["7001"]},{"dateTime":["2016-09-29"],"value":["8159"]},{"dateTime":["2016-09-30"],"value":["13057"]},{"dateTime":["2016-10-01"],"value":["8199"]},{"dateTime":["2016-10-02"],"value":["11007"]},{"dateTime":["2016-10-03"],"value":["7693"]},{"dateTime":["2016-10-04"],"value":["6815"]},{"dateTime":["2016-10-05"],"value":["6425"]},{"dateTime":["2016-10-06"],"value":["11990"]},{"dateTime":["2016-10-07"],"value":["5272"]},{"dateTime":["2016-10-08"],"value":["8174"]},{"dateTime":["2016-10-09"],"value":["11342"]},{"dateTime":["2016-10-10"],"value":["5454"]},{"dateTime":["2016-10-11"],"value":["8223"]},{"dateTime":["2016-10-12"],"value":["5827"]},{"dateTime":["2016-10-13"],"value":["11307"]},{"dateTime":["2016-10-14"],"value":["6421"]},{"dateTime":["2016-10-15"],"value":["7849"]},{"dateTime":["2016-10-16"],"value":["7691"]},{"dateTime":["2016-10-17"],"value":["6455"]},{"dateTime":["2016-10-18"],"value":["10827"]},{"dateTime":["2016-10-19"],"value":["6954"]},{"dateTime":["2016-10-20"],"value":["4057"]},{"dateTime":["2016-10-21"],"value":["11358"]},{"dateTime":["2016-10-22"],"value":["8697"]},{"dateTime":["2016-10-23"],"value":["7839"]},{"dateTime":["2016-10-24"],"value":["12372"]},{"dateTime":["2016-10-25"],"value":["8168"]},{"dateTime":["2016-10-26"],"value":["8029"]},{"dateTime":["2016-10-27"],"value":["10339"]},{"dateTime":["2016-10-28"],"value":["7222"]},{"dateTime":["2016-10-29"],"value":["5572"]},{"dateTime":["2016-10-30"],"value":["4644"]},{"dateTime":["2016-10-31"],"value":["7990"]},{"dateTime":["2016-11-01"],"value":["7481"]},{"dateTime":["2016-11-02"],"value":["6943"]},{"dateTime":["2016-11-03"],"value":["10938"]},{"dateTime":["2016-11-04"],"value":["8844"]},{"dateTime":["2016-11-05"],"value":["12943"]},{"dateTime":["2016-11-06"],"value":["9069"]},{"dateTime":["2016-11-07"],"value":["7407"]},{"dateTime":["2016-11-08"],"value":["8478"]},{"dateTime":["2016-11-09"],"value":["5400"]},{"dateTime":["2016-11-10"],"value":["17227"]},{"dateTime":["2016-11-11"],"value":["8813"]},{"dateTime":["2016-11-12"],"value":["9101"]},{"dateTime":["2016-11-13"],"value":["8537"]},{"dateTime":["2016-11-14"],"value":["6023"]},{"dateTime":["2016-11-15"],"value":["14019"]},{"dateTime":["2016-11-16"],"value":["7357"]},{"dateTime":["2016-11-17"],"value":["11868"]},{"dateTime":["2016-11-18"],"value":["8810"]},{"dateTime":["2016-11-19"],"value":["14173"]},{"dateTime":["2016-11-20"],"value":["5052"]},{"dateTime":["2016-11-21"],"value":["7639"]},{"dateTime":["2016-11-22"],"value":["9078"]},{"dateTime":["2016-11-23"],"value":["3161"]},{"dateTime":["2016-11-24"],"value":["7595"]},{"dateTime":["2016-11-25"],"value":["6066"]},{"dateTime":["2016-11-26"],"value":["4776"]},{"dateTime":["2016-11-27"],"value":["1540"]},{"dateTime":["2016-11-28"],"value":["8878"]},{"dateTime":["2016-11-29"],"value":["6079"]},{"dateTime":["2016-11-30"],"value":["8596"]},{"dateTime":["2016-12-01"],"value":["8407"]},{"dateTime":["2016-12-02"],"value":["16140"]},{"dateTime":["2016-12-03"],"value":["11953"]},{"dateTime":["2016-12-04"],"value":["12464"]},{"dateTime":["2016-12-05"],"value":["5864"]},{"dateTime":["2016-12-06"],"value":["5979"]},{"dateTime":["2016-12-07"],"value":["6853"]},{"dateTime":["2016-12-08"],"value":["7056"]},{"dateTime":["2016-12-09"],"value":["8474"]},{"dateTime":["2016-12-10"],"value":["9692"]},{"dateTime":["2016-12-11"],"value":["7544"]},{"dateTime":["2016-12-12"],"value":["6839"]},{"dateTime":["2016-12-13"],"value":["7413"]},{"dateTime":["2016-12-14"],"value":["13262"]},{"dateTime":["2016-12-15"],"value":["10220"]},{"dateTime":["2016-12-16"],"value":["10612"]},{"dateTime":["2016-12-17"],"value":["4551"]},{"dateTime":["2016-12-18"],"value":["10123"]},{"dateTime":["2016-12-19"],"value":["6285"]},{"dateTime":["2016-12-20"],"value":["1336"]},{"dateTime":["2016-12-21"],"value":["3145"]},{"dateTime":["2016-12-22"],"value":["2549"]},{"dateTime":["2016-12-23"],"value":["7622"]},{"dateTime":["2016-12-24"],"value":["10751"]},{"dateTime":["2016-12-25"],"value":["1641"]},{"dateTime":["2016-12-26"],"value":["7399"]},{"dateTime":["2016-12-27"],"value":["3321"]},{"dateTime":["2016-12-28"],"value":["9283"]},{"dateTime":["2016-12-29"],"value":["3468"]},{"dateTime":["2016-12-30"],"value":["10771"]},{"dateTime":["2016-12-31"],"value":["4273"]},{"dateTime":["2017-01-01"],"value":["2575"]},{"dateTime":["2017-01-02"],"value":["1455"]},{"dateTime":["2017-01-03"],"value":["10934"]},{"dateTime":["2017-01-04"],"value":["12008"]},{"dateTime":["2017-01-05"],"value":["11525"]},{"dateTime":["2017-01-06"],"value":["12212"]},{"dateTime":["2017-01-07"],"value":["13900"]},{"dateTime":["2017-01-08"],"value":["10520"]},{"dateTime":["2017-01-09"],"value":["10527"]},{"dateTime":["2017-01-10"],"value":["9006"]},{"dateTime":["2017-01-11"],"value":["12439"]},{"dateTime":["2017-01-12"],"value":["10494"]},{"dateTime":["2017-01-13"],"value":["9628"]},{"dateTime":["2017-01-14"],"value":["9862"]},{"dateTime":["2017-01-15"],"value":["12196"]},{"dateTime":["2017-01-16"],"value":["7420"]},{"dateTime":["2017-01-17"],"value":["6460"]},{"dateTime":["2017-01-18"],"value":["7424"]},{"dateTime":["2017-01-19"],"value":["7908"]},{"dateTime":["2017-01-20"],"value":["10468"]},{"dateTime":["2017-01-21"],"value":["18868"]},{"dateTime":["2017-01-22"],"value":["11017"]},{"dateTime":["2017-01-23"],"value":["8137"]},{"dateTime":["2017-01-24"],"value":["6751"]},{"dateTime":["2017-01-25"],"value":["8905"]},{"dateTime":["2017-01-26"],"value":["9114"]},{"dateTime":["2017-01-27"],"value":["11099"]},{"dateTime":["2017-01-28"],"value":["8136"]},{"dateTime":["2017-01-29"],"value":["8533"]},{"dateTime":["2017-01-30"],"value":["13123"]},{"dateTime":["2017-01-31"],"value":["8281"]},{"dateTime":["2017-02-01"],"value":["7668"]},{"dateTime":["2017-02-02"],"value":["13521"]},{"dateTime":["2017-02-03"],"value":["10753"]},{"dateTime":["2017-02-04"],"value":["11272"]},{"dateTime":["2017-02-05"],"value":["16275"]},{"dateTime":["2017-02-06"],"value":["13354"]},{"dateTime":["2017-02-07"],"value":["9639"]},{"dateTime":["2017-02-08"],"value":["12706"]},{"dateTime":["2017-02-09"],"value":["5836"]},{"dateTime":["2017-02-10"],"value":["12959"]},{"dateTime":["2017-02-11"],"value":["14026"]},{"dateTime":["2017-02-12"],"value":["10943"]},{"dateTime":["2017-02-13"],"value":["5641"]},{"dateTime":["2017-02-14"],"value":["12746"]},{"dateTime":["2017-02-15"],"value":["8478"]},{"dateTime":["2017-02-16"],"value":["13337"]},{"dateTime":["2017-02-17"],"value":["8135"]},{"dateTime":["2017-02-18"],"value":["8199"]},{"dateTime":["2017-02-19"],"value":["7705"]},{"dateTime":["2017-02-20"],"value":["11438"]},{"dateTime":["2017-02-21"],"value":["10113"]},{"dateTime":["2017-02-22"],"value":["12511"]},{"dateTime":["2017-02-23"],"value":["11417"]},{"dateTime":["2017-02-24"],"value":["17221"]},{"dateTime":["2017-02-25"],"value":["17397"]},{"dateTime":["2017-02-26"],"value":["8499"]},{"dateTime":["2017-02-27"],"value":["6604"]},{"dateTime":["2017-02-28"],"value":["13288"]},{"dateTime":["2017-03-01"],"value":["10754"]},{"dateTime":["2017-03-02"],"value":["9998"]},{"dateTime":["2017-03-03"],"value":["16251"]},{"dateTime":["2017-03-04"],"value":["9966"]},{"dateTime":["2017-03-05"],"value":["6842"]},{"dateTime":["2017-03-06"],"value":["10622"]},{"dateTime":["2017-03-07"],"value":["8163"]},{"dateTime":["2017-03-08"],"value":["9175"]},{"dateTime":["2017-03-09"],"value":["8116"]},{"dateTime":["2017-03-10"],"value":["8295"]},{"dateTime":["2017-03-11"],"value":["13506"]},{"dateTime":["2017-03-12"],"value":["5338"]},{"dateTime":["2017-03-13"],"value":["7388"]},{"dateTime":["2017-03-14"],"value":["3010"]},{"dateTime":["2017-03-15"],"value":["13614"]},{"dateTime":["2017-03-16"],"value":["12667"]},{"dateTime":["2017-03-17"],"value":["10642"]},{"dateTime":["2017-03-18"],"value":["14506"]},{"dateTime":["2017-03-19"],"value":["21348"]},{"dateTime":["2017-03-20"],"value":["8956"]},{"dateTime":["2017-03-21"],"value":["9163"]},{"dateTime":["2017-03-22"],"value":["10200"]},{"dateTime":["2017-03-23"],"value":["13726"]},{"dateTime":["2017-03-24"],"value":["12413"]},{"dateTime":["2017-03-25"],"value":["5005"]},{"dateTime":["2017-03-26"],"value":["10820"]},{"dateTime":["2017-03-27"],"value":["7792"]},{"dateTime":["2017-03-28"],"value":["14919"]},{"dateTime":["2017-03-29"],"value":["10161"]},{"dateTime":["2017-03-30"],"value":["9477"]},{"dateTime":["2017-03-31"],"value":["9297"]},{"dateTime":["2017-04-01"],"value":["12164"]},{"dateTime":["2017-04-02"],"value":["11866"]},{"dateTime":["2017-04-03"],"value":["12319"]},{"dateTime":["2017-04-04"],"value":["9674"]},{"dateTime":["2017-04-05"],"value":["7622"]},{"dateTime":["2017-04-06"],"value":["7768"]},{"dateTime":["2017-04-07"],"value":["10748"]},{"dateTime":["2017-04-08"],"value":["8606"]},{"dateTime":["2017-04-09"],"value":["14582"]},{"dateTime":["2017-04-10"],"value":["14674"]},{"dateTime":["2017-04-11"],"value":["12895"]},{"dateTime":["2017-04-12"],"value":["7289"]},{"dateTime":["2017-04-13"],"value":["10372"]},{"dateTime":["2017-04-14"],"value":["21453"]},{"dateTime":["2017-04-15"],"value":["13985"]},{"dateTime":["2017-04-16"],"value":["21727"]},{"dateTime":["2017-04-17"],"value":["15825"]},{"dateTime":["2017-04-18"],"value":["11336"]},{"dateTime":["2017-04-19"],"value":["14770"]},{"dateTime":["2017-04-20"],"value":["9249"]},{"dateTime":["2017-04-21"],"value":["10833"]},{"dateTime":["2017-04-22"],"value":["15734"]},{"dateTime":["2017-04-23"],"value":["16762"]},{"dateTime":["2017-04-24"],"value":["6605"]},{"dateTime":["2017-04-25"],"value":["8402"]},{"dateTime":["2017-04-26"],"value":["11657"]},{"dateTime":["2017-04-27"],"value":["12322"]},{"dateTime":["2017-04-28"],"value":["8395"]},{"dateTime":["2017-04-29"],"value":["28112"]},{"dateTime":["2017-04-30"],"value":["7888"]},{"dateTime":["2017-05-01"],"value":["9576"]},{"dateTime":["2017-05-02"],"value":["16799"]},{"dateTime":["2017-05-03"],"value":["16745"]},{"dateTime":["2017-05-04"],"value":["15594"]},{"dateTime":["2017-05-05"],"value":["7181"]},{"dateTime":["2017-05-06"],"value":["18063"]},{"dateTime":["2017-05-07"],"value":["13033"]},{"dateTime":["2017-05-08"],"value":["18707"]},{"dateTime":["2017-05-09"],"value":["11731"]},{"dateTime":["2017-05-10"],"value":["7412"]},{"dateTime":["2017-05-11"],"value":["8460"]},{"dateTime":["2017-05-12"],"value":["7896"]},{"dateTime":["2017-05-13"],"value":["7229"]},{"dateTime":["2017-05-14"],"value":["6272"]},{"dateTime":["2017-05-15"],"value":["16270"]},{"dateTime":["2017-05-16"],"value":["10105"]},{"dateTime":["2017-05-17"],"value":["10869"]},{"dateTime":["2017-05-18"],"value":["13738"]},{"dateTime":["2017-05-19"],"value":["15155"]},{"dateTime":["2017-05-20"],"value":["17567"]},{"dateTime":["2017-05-21"],"value":["13051"]},{"dateTime":["2017-05-22"],"value":["14424"]},{"dateTime":["2017-05-23"],"value":["11302"]},{"dateTime":["2017-05-24"],"value":["17390"]},{"dateTime":["2017-05-25"],"value":["11431"]},{"dateTime":["2017-05-26"],"value":["11368"]},{"dateTime":["2017-05-27"],"value":["15998"]},{"dateTime":["2017-05-28"],"value":["12076"]},{"dateTime":["2017-05-29"],"value":["9984"]},{"dateTime":["2017-05-30"],"value":["11219"]},{"dateTime":["2017-05-31"],"value":["17159"]},{"dateTime":["2017-06-01"],"value":["8975"]},{"dateTime":["2017-06-02"],"value":["9346"]},{"dateTime":["2017-06-03"],"value":["17216"]},{"dateTime":["2017-06-04"],"value":["9712"]},{"dateTime":["2017-06-05"],"value":["16121"]},{"dateTime":["2017-06-06"],"value":["10942"]},{"dateTime":["2017-06-07"],"value":["19079"]},{"dateTime":["2017-06-08"],"value":["17120"]},{"dateTime":["2017-06-09"],"value":["16307"]},{"dateTime":["2017-06-10"],"value":["18517"]},{"dateTime":["2017-06-11"],"value":["6830"]},{"dateTime":["2017-06-12"],"value":["7642"]},{"dateTime":["2017-06-13"],"value":["13555"]},{"dateTime":["2017-06-14"],"value":["14251"]},{"dateTime":["2017-06-15"],"value":["11054"]},{"dateTime":["2017-06-16"],"value":["11880"]},{"dateTime":["2017-06-17"],"value":["21408"]},{"dateTime":["2017-06-18"],"value":["20301"]},{"dateTime":["2017-06-19"],"value":["11399"]},{"dateTime":["2017-06-20"],"value":["10130"]},{"dateTime":["2017-06-21"],"value":["11097"]},{"dateTime":["2017-06-22"],"value":["15011"]},{"dateTime":["2017-06-23"],"value":["12072"]},{"dateTime":["2017-06-24"],"value":["11454"]},{"dateTime":["2017-06-25"],"value":["11939"]},{"dateTime":["2017-06-26"],"value":["8631"]},{"dateTime":["2017-06-27"],"value":["7865"]},{"dateTime":["2017-06-28"],"value":["7817"]},{"dateTime":["2017-06-29"],"value":["11676"]},{"dateTime":["2017-06-30"],"value":["11732"]},{"dateTime":["2017-07-01"],"value":["5787"]},{"dateTime":["2017-07-02"],"value":["8800"]},{"dateTime":["2017-07-03"],"value":["7164"]},{"dateTime":["2017-07-04"],"value":["6378"]},{"dateTime":["2017-07-05"],"value":["7594"]},{"dateTime":["2017-07-06"],"value":["25335"]},{"dateTime":["2017-07-07"],"value":["15529"]},{"dateTime":["2017-07-08"],"value":["21520"]},{"dateTime":["2017-07-09"],"value":["8450"]},{"dateTime":["2017-07-10"],"value":["12144"]},{"dateTime":["2017-07-11"],"value":["11372"]},{"dateTime":["2017-07-12"],"value":["14312"]},{"dateTime":["2017-07-13"],"value":["11282"]},{"dateTime":["2017-07-14"],"value":["12960"]},{"dateTime":["2017-07-15"],"value":["7861"]},{"dateTime":["2017-07-16"],"value":["6553"]},{"dateTime":["2017-07-17"],"value":["12093"]},{"dateTime":["2017-07-18"],"value":["8623"]},{"dateTime":["2017-07-19"],"value":["8044"]},{"dateTime":["2017-07-20"],"value":["6782"]},{"dateTime":["2017-07-21"],"value":["4962"]},{"dateTime":["2017-07-22"],"value":["19728"]},{"dateTime":["2017-07-23"],"value":["4454"]},{"dateTime":["2017-07-24"],"value":["7055"]},{"dateTime":["2017-07-25"],"value":["5842"]},{"dateTime":["2017-07-26"],"value":["9514"]},{"dateTime":["2017-07-27"],"value":["3019"]},{"dateTime":["2017-07-28"],"value":["8719"]},{"dateTime":["2017-07-29"],"value":["11351"]},{"dateTime":["2017-07-30"],"value":["13570"]},{"dateTime":["2017-07-31"],"value":["28755"]},{"dateTime":["2017-08-01"],"value":["24663"]},{"dateTime":["2017-08-02"],"value":["12452"]},{"dateTime":["2017-08-03"],"value":["25750"]},{"dateTime":["2017-08-04"],"value":["23785"]},{"dateTime":["2017-08-05"],"value":["16673"]},{"dateTime":["2017-08-06"],"value":["21089"]},{"dateTime":["2017-08-07"],"value":["15593"]},{"dateTime":["2017-08-08"],"value":["7027"]},{"dateTime":["2017-08-09"],"value":["13182"]},{"dateTime":["2017-08-10"],"value":["7444"]},{"dateTime":["2017-08-11"],"value":["6813"]},{"dateTime":["2017-08-12"],"value":["9394"]},{"dateTime":["2017-08-13"],"value":["3123"]},{"dateTime":["2017-08-14"],"value":["9114"]},{"dateTime":["2017-08-15"],"value":["7225"]},{"dateTime":["2017-08-16"],"value":["17293"]},{"dateTime":["2017-08-17"],"value":["11969"]},{"dateTime":["2017-08-18"],"value":["14816"]},{"dateTime":["2017-08-19"],"value":["8377"]},{"dateTime":["2017-08-20"],"value":["11256"]},{"dateTime":["2017-08-21"],"value":["13125"]},{"dateTime":["2017-08-22"],"value":["10628"]},{"dateTime":["2017-08-23"],"value":["12241"]},{"dateTime":["2017-08-24"],"value":["12643"]},{"dateTime":["2017-08-25"],"value":["11701"]},{"dateTime":["2017-08-26"],"value":["13191"]},{"dateTime":["2017-08-27"],"value":["13836"]},{"dateTime":["2017-08-28"],"value":["9303"]},{"dateTime":["2017-08-29"],"value":["17287"]},{"dateTime":["2017-08-30"],"value":["10712"]},{"dateTime":["2017-08-31"],"value":["14567"]},{"dateTime":["2017-09-01"],"value":["16157"]},{"dateTime":["2017-09-02"],"value":["21339"]},{"dateTime":["2017-09-03"],"value":["14595"]},{"dateTime":["2017-09-04"],"value":["15477"]},{"dateTime":["2017-09-05"],"value":["9365"]},{"dateTime":["2017-09-06"],"value":["11637"]},{"dateTime":["2017-09-07"],"value":["13367"]},{"dateTime":["2017-09-08"],"value":["19048"]},{"dateTime":["2017-09-09"],"value":["9156"]},{"dateTime":["2017-09-10"],"value":["12143"]},{"dateTime":["2017-09-11"],"value":["6909"]},{"dateTime":["2017-09-12"],"value":["8883"]},{"dateTime":["2017-09-13"],"value":["14180"]},{"dateTime":["2017-09-14"],"value":["12530"]},{"dateTime":["2017-09-15"],"value":["10739"]},{"dateTime":["2017-09-16"],"value":["12049"]},{"dateTime":["2017-09-17"],"value":["7032"]},{"dateTime":["2017-09-18"],"value":["11524"]},{"dateTime":["2017-09-19"],"value":["9876"]},{"dateTime":["2017-09-20"],"value":["10738"]},{"dateTime":["2017-09-21"],"value":["9784"]},{"dateTime":["2017-09-22"],"value":["11621"]},{"dateTime":["2017-09-23"],"value":["10106"]},{"dateTime":["2017-09-24"],"value":["8735"]},{"dateTime":["2017-09-25"],"value":["11667"]},{"dateTime":["2017-09-26"],"value":["10237"]},{"dateTime":["2017-09-27"],"value":["11251"]},{"dateTime":["2017-09-28"],"value":["10128"]},{"dateTime":["2017-09-29"],"value":["11240"]},{"dateTime":["2017-09-30"],"value":["15062"]},{"dateTime":["2017-10-01"],"value":["10241"]},{"dateTime":["2017-10-02"],"value":["10777"]},{"dateTime":["2017-10-03"],"value":["9454"]},{"dateTime":["2017-10-04"],"value":["7160"]},{"dateTime":["2017-10-05"],"value":["8307"]},{"dateTime":["2017-10-06"],"value":["16198"]},{"dateTime":["2017-10-07"],"value":["9364"]},{"dateTime":["2017-10-08"],"value":["10295"]},{"dateTime":["2017-10-09"],"value":["10868"]},{"dateTime":["2017-10-10"],"value":["10325"]},{"dateTime":["2017-10-11"],"value":["7557"]},{"dateTime":["2017-10-12"],"value":["7599"]},{"dateTime":["2017-10-13"],"value":["8884"]},{"dateTime":["2017-10-14"],"value":["10077"]},{"dateTime":["2017-10-15"],"value":["11142"]},{"dateTime":["2017-10-16"],"value":["6195"]},{"dateTime":["2017-10-17"],"value":["10074"]},{"dateTime":["2017-10-18"],"value":["8039"]},{"dateTime":["2017-10-19"],"value":["11582"]},{"dateTime":["2017-10-20"],"value":["13700"]},{"dateTime":["2017-10-21"],"value":["13798"]},{"dateTime":["2017-10-22"],"value":["10666"]},{"dateTime":["2017-10-23"],"value":["11882"]},{"dateTime":["2017-10-24"],"value":["7698"]},{"dateTime":["2017-10-25"],"value":["6681"]},{"dateTime":["2017-10-26"],"value":["10719"]},{"dateTime":["2017-10-27"],"value":["10901"]},{"dateTime":["2017-10-28"],"value":["10750"]},{"dateTime":["2017-10-29"],"value":["8192"]},{"dateTime":["2017-10-30"],"value":["10310"]},{"dateTime":["2017-10-31"],"value":["9052"]},{"dateTime":["2017-11-01"],"value":["8338"]},{"dateTime":["2017-11-02"],"value":["6839"]},{"dateTime":["2017-11-03"],"value":["11938"]},{"dateTime":["2017-11-04"],"value":["14951"]},{"dateTime":["2017-11-05"],"value":["10474"]},{"dateTime":["2017-11-06"],"value":["10137"]},{"dateTime":["2017-11-07"],"value":["12485"]},{"dateTime":["2017-11-08"],"value":["10036"]},{"dateTime":["2017-11-09"],"value":["6499"]},{"dateTime":["2017-11-10"],"value":["9645"]},{"dateTime":["2017-11-11"],"value":["13346"]},{"dateTime":["2017-11-12"],"value":["8107"]},{"dateTime":["2017-11-13"],"value":["8203"]},{"dateTime":["2017-11-14"],"value":["6289"]},{"dateTime":["2017-11-15"],"value":["11249"]},{"dateTime":["2017-11-16"],"value":["8449"]},{"dateTime":["2017-11-17"],"value":["11965"]},{"dateTime":["2017-11-18"],"value":["12894"]},{"dateTime":["2017-11-19"],"value":["12468"]},{"dateTime":["2017-11-20"],"value":["9118"]},{"dateTime":["2017-11-21"],"value":["11525"]},{"dateTime":["2017-11-22"],"value":["17612"]},{"dateTime":["2017-11-23"],"value":["3837"]},{"dateTime":["2017-11-24"],"value":["6107"]},{"dateTime":["2017-11-25"],"value":["3939"]},{"dateTime":["2017-11-26"],"value":["9762"]},{"dateTime":["2017-11-27"],"value":["10962"]},{"dateTime":["2017-11-28"],"value":["7373"]},{"dateTime":["2017-11-29"],"value":["10182"]},{"dateTime":["2017-11-30"],"value":["6821"]},{"dateTime":["2017-12-01"],"value":["14290"]},{"dateTime":["2017-12-02"],"value":["19143"]}]}
Convert JSON to data frame.
output <- fromJSON(output, simplifyDataFrame = TRUE)
output
$`activities-steps`
NA
Clean activity data. Unlist JSON elements into columns. Convert data types and rename value column.
data <- output$`activities-steps`
data$dateTime <- unlist(data$dateTime)
data$stepCount <- unlist(data$stepCount)
data
colnames(data)[2] <- "stepCount"
data$dateTime <- as.Date(data$dateTime, "%Y-%m-%d")
data$stepCount <- as.integer(data$stepCount)
Add additional date features for day of the week and number week of the year.
day <- weekdays(data$dateTime)
week <- week(data$dateTime)
data <- cbind(data, day, week)
data
Histogram representing distribution of daily step counts. The histogram shows a bell curve with a relatively normal distribution.
qplot(data$stepCount,
geom = "histogram",
binwidth = 500)
Create a function to scrape weather data from https://www.wunderground.com/ for a given time period. (The source is unable to provide history for very large periods of time and therefore must be retrieved in batches).
getUrl <- function (date1, date2) {
# Assemble proper url to retrieve weather data from webpage.
# Arguments: date1, date2 are strings representing the start and end date in the format 'YYYY/mm/dd'
# Returns: the customized url
URL <-
paste(
"https://www.wunderground.com/history/airport/KBOS/",
date1,
"/CustomHistory.html?dayend=",
substr(date2, 9, 10),
"&monthend=",
substr(date2, 6, 7),
"&yearend=",
substr(date2, 1, 4),
"&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=",
sep = ""
)
return (URL)
}
fetchWeather <- function(startDate, endDate) {
# Retrieve weather data for a given date range from webpage.
# Arguments: startDate, endDate are strings representing the start and end date in the format 'YYYY/mm/dd'
# Returns: a data frame containing the date, temperature, humidity, wind, precipitation, and types of weather
webpage <- RCurl::getURL(getUrl(startDate, endDate))
tc <- textConnection(webpage)
webpage <- readLines(tc)
close(tc)
pagetree <- htmlTreeParse(webpage, useInternalNodes = TRUE)
weatherDate <-
unlist(xpathApply(pagetree, "//*[@id='obsTable']/tbody/tr/td[1]/a", xmlValue))
weatherTempHi <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[2]/span",
xmlValue
))
weatherTempLo <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[4]/span",
xmlValue
))
weatherHumidity <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[9]/span",
xmlValue
))
weatherWind <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[18]/span",
xmlValue
))
weatherPrecip <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[20]/span",
xmlValue
))
weatherType <-
unlist(xpathApply(pagetree, "//*[@id='obsTable']/tbody/tr/td[21]", xmlValue))
# clean and parse through weatherType text
weatherType <- weatherType[grepl("\n", weatherType)] # remove empty strings
weatherType <- gsub("\n", "", weatherType)
weatherType <- gsub("\t", "", weatherType)
################## REMOVE THIS weatherType
weatherRain <- grepl("Rain", weatherType)
weatherThunder <- grepl("Thunderstorm", weatherType)
weatherFog <- grepl("Fog", weatherType)
weatherSnow <- grepl("Snow", weatherType)
weatherFrame <-
data.frame(
weatherDate,
weatherTempHi,
weatherTempLo,
weatherHumidity,
weatherWind,
weatherPrecip,
weatherType,
weatherRain,
weatherThunder,
weatherFog,
weatherSnow,
stringsAsFactors = FALSE
)
weatherFrame[, 1:6] <- sapply(weatherFrame[, 1:6], as.numeric)
colnames(weatherFrame) <-
c(
"date",
"tempHi",
"tempLo",
"avgHumidity",
"avgWind",
"precip",
"type",
"rain",
"thunder",
"fog",
"snow"
)
return(weatherFrame)
}
Retrieve weather from 2016-23-17 to 2017-12-02 and combine the two batches into one weather data frame.
weather2016 <- fetchWeather("2016/06/23", "2017/06/22")
NAs introduced by coercion
weather2017 <- fetchWeather("2017/06/23", "2017/12/02")
NAs introduced by coercion
weatherTotal <- rbind(weather2016, weather2017)
weatherTotal
Merge activity data frame with weather data frame.
data <- cbind(data, weatherTotal[,2:11])
data
Checking for missing values, it is revealed that only the precipitation column contains missing values.
sapply(data[,1:14], function(x) length(x[which(is.na(x) == TRUE)]))
dateTime stepCount day week tempHi tempLo avgHumidity avgWind precip type
0 0 0 0 0 0 0 0 56 0
rain thunder fog snow
0 0 0 0
Exploratory visuals for precipitation.
# precipitation over a one year period
plot(weather2016$precip, xlab="days since 2016/06/23", ylab="precipitation (in)")
# distribution of precipitation amounts
qplot(weatherTotal$precip,
geom="histogram")
# distribution of non-zero precipitation amounts
qplot(weatherTotal$precip[which(weatherTotal$precip>0)],
geom="histogram")
plot(data$precip, data$avgHumidity)
plot(data$precip, data$avgWind)
Partition data into training and validation sets.
set.seed(200)
sampleRows <- sample.int(nrow(data), size = nrow(data)*.75)
sampleRows
[1] 282 308 311 363 350 439 372 51 273 123 236 336 80 335 197 158 291 68 471 329 234 521 105 466 161 134 361 191 16 261
[31] 132 83 240 162 458 277 88 478 242 238 287 351 337 86 461 333 77 278 314 399 33 57 425 266 245 64 71 76 264 316
[61] 69 267 354 470 360 480 101 130 44 520 163 472 213 126 181 142 265 511 290 3 366 332 292 514 445 349 204 435 164 523
[91] 208 97 14 100 452 504 133 451 193 294 429 356 49 237 82 21 269 2 211 201 379 32 72 249 205 340 365 81 190 4
[121] 255 258 128 315 404 303 207 239 460 485 442 260 270 235 224 115 244 488 98 283 202 137 438 319 307 47 369 250 199 154
[151] 384 74 413 513 188 189 341 56 347 464 157 299 11 517 78 388 417 19 371 420 328 8 63 30 175 518 10 309 279 24
[181] 129 39 286 212 220 248 31 457 93 177 263 390 483 506 195 508 180 409 200 355 456 144 192 377 475 125 38 272 411 138
[211] 322 342 5 59 145 432 392 427 463 246 22 416 380 194 17 397 90 1 424 106 151 302 298 459 70 455 406 186 441 343
[241] 320 499 170 398 490 345 373 353 407 473 52 139 254 338 166 198 383 60 359 152 247 489 111 364 79 73 334 67 339 368
[271] 233 493 477 149 13 225 167 7 135 288 389 405 147 358 61 394 55 178 37 259 327 156 310 143 214 18 296 297 306 226
[301] 385 391 58 183 482 215 107 402 495 241 346 85 467 410 113 502 400 362 99 285 274 293 503 23 176 510 231 124 352 122
[331] 172 276 95 512 150 89 173 381 42 312 36 414 196 423 41 94 54 386 66 396 217 34 501 323 28 103 382 96 230 516
[361] 222 110 218 324 268 62 357 87 454 462 102 229 169 174 203 262 401 165 223 65 43 418 114 313 500 271 481 171 524 289
[391] 281 116 491 408 6 253
trainingData <- data[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
trainingData
validationData <- data[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
validationData
Create a multiple regression model to impute missing precipitation data.
pred <- lm(precip ~
week +
tempHi +
tempLo +
avgHumidity +
avgWind +
type +
rain +
thunder +
fog +
snow,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ week + tempHi + tempLo + avgHumidity +
avgWind + type + rain + thunder + fog + snow, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.47682 -0.07226 -0.00991 0.03411 1.91138
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.229e-01 1.030e-01 -3.134 0.001876 **
week 5.671e-05 8.166e-04 0.069 0.944674
tempHi -2.127e-03 2.173e-03 -0.979 0.328381
tempLo 1.794e-03 2.427e-03 0.739 0.460359
avgHumidity 4.151e-03 1.130e-03 3.674 0.000278 ***
avgWind 1.125e-02 3.204e-03 3.512 0.000506 ***
typeFog -6.759e-02 7.910e-02 -0.855 0.393389
typeFog,Rain 8.276e-02 7.565e-02 1.094 0.274717
typeFog,Rain,Snow 7.985e-01 9.274e-02 8.611 2.89e-16 ***
typeFog,Rain,Thunderstorm 1.185e+00 2.090e-01 5.669 3.10e-08 ***
typeFog,Snow 5.221e-02 1.089e-01 0.480 0.631808
typeRain 1.360e-01 3.415e-02 3.983 8.35e-05 ***
typeRain,Snow 5.398e-02 9.126e-02 0.592 0.554584
typeRain,Snow,Thunderstorm 1.059e+00 2.073e-01 5.110 5.42e-07 ***
typeRain,Thunderstorm 4.065e-01 6.074e-02 6.693 9.21e-11 ***
typeSnow -1.835e-02 7.452e-02 -0.246 0.805638
typeThunderstorm -2.087e-02 2.068e-01 -0.101 0.919662
rainTRUE NA NA NA NA
thunderTRUE NA NA NA NA
fogTRUE NA NA NA NA
snowTRUE NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2036 on 335 degrees of freedom
Multiple R-squared: 0.5001, Adjusted R-squared: 0.4762
F-statistic: 20.95 on 16 and 335 DF, p-value: < 2.2e-16
Backfit the model to remove statistically insignificant variables.
pred <- lm(precip ~
tempHi +
tempLo +
avgHumidity +
avgWind +
type,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ tempHi + tempLo + avgHumidity + avgWind +
type, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.47741 -0.07189 -0.00959 0.03295 1.91103
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.321541 0.100983 -3.184 0.001588 **
tempHi -0.002133 0.002168 -0.984 0.325865
tempLo 0.001808 0.002414 0.749 0.454358
avgHumidity 0.004153 0.001128 3.683 0.000269 ***
avgWind 0.011240 0.003195 3.518 0.000495 ***
typeFog -0.067484 0.078963 -0.855 0.393364
typeFog,Rain 0.082454 0.075405 1.093 0.274967
typeFog,Rain,Snow 0.797932 0.092206 8.654 < 2e-16 ***
typeFog,Rain,Thunderstorm 1.184055 0.208341 5.683 2.87e-08 ***
typeFog,Snow 0.051128 0.107575 0.475 0.634900
typeRain 0.135897 0.034045 3.992 8.06e-05 ***
typeRain,Snow 0.053860 0.091105 0.591 0.554792
typeRain,Snow,Thunderstorm 1.060752 0.206095 5.147 4.52e-07 ***
typeRain,Thunderstorm 0.406320 0.060569 6.708 8.36e-11 ***
typeSnow -0.019246 0.073281 -0.263 0.792997
typeThunderstorm -0.020943 0.206471 -0.101 0.919266
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2033 on 336 degrees of freedom
Multiple R-squared: 0.5001, Adjusted R-squared: 0.4778
F-statistic: 22.41 on 15 and 336 DF, p-value: < 2.2e-16
pred <- lm(precip ~
avgHumidity +
avgWind +
type,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ avgHumidity + avgWind + type, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.48974 -0.07357 -0.01055 0.03831 1.90373
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.392651 0.078609 -4.995 9.45e-07 ***
avgHumidity 0.004402 0.001017 4.329 1.97e-05 ***
avgWind 0.011700 0.003162 3.701 0.000251 ***
typeFog -0.069734 0.078594 -0.887 0.375564
typeFog,Rain 0.084138 0.075090 1.121 0.263294
typeFog,Rain,Snow 0.806531 0.089564 9.005 < 2e-16 ***
typeFog,Rain,Thunderstorm 1.179441 0.206895 5.701 2.60e-08 ***
typeFog,Snow 0.066562 0.103511 0.643 0.520636
typeRain 0.140932 0.033653 4.188 3.60e-05 ***
typeRain,Snow 0.061442 0.088747 0.692 0.489205
typeRain,Snow,Thunderstorm 1.067369 0.204688 5.215 3.22e-07 ***
typeRain,Thunderstorm 0.397205 0.059922 6.629 1.34e-10 ***
typeSnow 0.000282 0.069557 0.004 0.996768
typeThunderstorm -0.048375 0.204459 -0.237 0.813112
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2031 on 338 degrees of freedom
Multiple R-squared: 0.4982, Adjusted R-squared: 0.4789
F-statistic: 25.81 on 13 and 338 DF, p-value: < 2.2e-16
pred <- lm(precip ~
avgHumidity +
avgWind,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ avgHumidity + avgWind, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.37157 -0.12222 -0.04563 0.04172 1.92679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6969737 0.0776456 -8.976 < 2e-16 ***
avgHumidity 0.0085233 0.0008694 9.804 < 2e-16 ***
avgWind 0.0201277 0.0035645 5.647 3.39e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2451 on 349 degrees of freedom
Multiple R-squared: 0.2453, Adjusted R-squared: 0.241
F-statistic: 56.72 on 2 and 349 DF, p-value: < 2.2e-16
Determine the accuracy of the imputation model for precipitation to be 54.26%. While the model does not have great predictive power with an adjusted r-squared value of .241, it has a strong, statistically significant p-value of 2.2e-16 and serves the purpose of imputing missing precipation values.
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$precip)
longer object length is not a multiple of shorter object length
[1] 0.5426136
Impute missing precipitation value.
imputePrecip <- function(r) {
# Given a row containing missing precipitation data, impute the missing value.
# Arguments: row containing missing precip data
# Returns: the predicted precipitation value for the entry
precipPred <- (pred$coefficients[[1]] +
(pred$coefficients[[2]] * r[7]) + (pred$coefficients[[3]] * r[8]))
return(round(max(precipPred, 0), 2))
}
isWeatherEvent <- function(type) {
# Determine if a particular date had a recorded weather event.
# Arguments: string representing the 'type' field of the entry
# Returns: true if the string matches any of the weather event types
# Weather types are hardcoded and this function should be abstracted to include all possible combinations more programmatically
return(
type == "Rain" |
type == "Thunderstorm" |
type == "Fog" |
type == "Snow" |
type == "Fog,Rain" |
type == "Rain,Thunderstorm" |
type == "Fog,Rain,Thunderstorm" |
type == "Fog,Rain,Snow" |
type == "Fog,Snow" |
type == "Rain,Snow" |
type == "Rain,Snow,Thunderstorm"
)
}
# for data points where no weather events were recorded, impute precipitation of 0.00 if precipitation is NA
data[which(!isWeatherEvent(data$type)), ][is.na(data[which(!isWeatherEvent(data$type)), ]$precip), ]$precip <- 0.00
# for remaining data points where a weather event was recorded and precipitation is NA, impute precipitation
incompleteCases <- data[is.na(data$precip), ]
imputedPrecip <- sapply(1:34, function(x) imputePrecip(incompleteCases[x, ])[[1]])
data[is.na(data$precip), ]$precip <- imputedPrecip
# confirm that there are no remaining missing values
sapply(data[, 1:14], function(x) length(x[which(is.na(x) == TRUE)]))
dateTime stepCount day week tempHi tempLo avgHumidity avgWind precip type
0 0 0 0 0 0 0 0 0 0
rain thunder fog snow
0 0 0 0
data
Detect outliers where stepCount is greater than or less than 3 standard deviations away from the mean.
dataWOutliers <-
data # make a copy of the data before removing outliers
dataMean <- mean(data$stepCount) # 10165.07
dataSd <- sd(data$stepCount) # 4223.689
outliers <- data[which((data$stepCount > (dataMean + 3 * dataSd)) |
(data$stepCount < (dataMean - 3 * dataSd))),]
outliers
Remove outliers.
data <- data[-which((data$stepCount > (dataMean + 3 * dataSd)) |
(data$stepCount < (dataMean - 3 * dataSd))), ]
data
After removing outliers, the data shows a bell curve revealing a more normal distribution.
qplot(data$stepCount,
geom="histogram",
binwidth=500)
Establish connection and insert data into mongodb collection.
db <- mongo(collection = "activityWeatherCol", db = "activityWeatherdb", url = "mongodb://localhost")
# clear collection if already populated
if(db$count() > 0) {
db$drop()
}
db$insert(data)
List of 5
$ nInserted : num 522
$ nMatched : num 0
$ nRemoved : num 0
$ nUpserted : num 0
$ writeErrors: list()
Query entire database for all entries. Visually explore data using a time series regression model.
stepsGeneralFrame <- db$find('{}', '{"type":false, "_id":false}')
plot(stepsGeneralFrame$stepCount, xlab="days elapsed", ylab="stepCount")
MULTIPLE REGRESSION MODEL 1: STEPS BY WEEK Compare stepCount against (almost) all other independent variables to determine if there exist any correlations.
According to the multiple regression model there seems to be no strong correlation with the given independent variables with an overall adjusted R-squared value of .05771 and p-value of .001242. The model is statistically significant but has little to no predictive power with an accuracy of 0%.
set.seed(100)
sampleRows <- sample.int(nrow(stepsGeneralFrame), size = nrow(stepsGeneralFrame)*.75)
trainingData <- stepsGeneralFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsGeneralFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
day +
week +
tempHi +
tempLo +
avgHumidity +
avgWind +
precip +
rain +
thunder +
fog +
snow,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ day + week + tempHi + tempLo + avgHumidity +
avgWind + precip + rain + thunder + fog + snow, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-9337.4 -2325.9 -76.9 2205.9 12033.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13065.794 1781.860 7.333 1.41e-12 ***
dayMonday -1504.756 703.464 -2.139 0.0331 *
daySaturday 341.263 711.292 0.480 0.6317
daySunday -1033.578 701.225 -1.474 0.1413
dayThursday -1543.150 684.775 -2.254 0.0248 *
dayTuesday -1860.627 726.154 -2.562 0.0108 *
dayWednesday -1557.788 702.384 -2.218 0.0272 *
week -55.207 13.786 -4.004 7.50e-05 ***
tempHi 26.328 38.015 0.693 0.4890
tempLo -16.632 42.396 -0.392 0.6951
avgHumidity -25.538 19.172 -1.332 0.1837
avgWind 6.796 55.839 0.122 0.9032
precip 380.843 819.087 0.465 0.6422
rainTRUE 484.321 526.746 0.919 0.3584
thunderTRUE -80.150 899.797 -0.089 0.9291
fogTRUE 974.148 779.644 1.249 0.2123
snowTRUE -813.700 883.877 -0.921 0.3579
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3718 on 374 degrees of freedom
Multiple R-squared: 0.09637, Adjusted R-squared: 0.05771
F-statistic: 2.493 on 16 and 374 DF, p-value: 0.001242
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
After back-fitting the model by removing all statistically insigificant variables, there still does not appear to be a strong correlation when considering week of the year as the independent variables and step count as the dependent variable. The overall adjusted R-squared value of the model is .03576 and 9.953e-05, indicating an improvement over the previous .05771 and .001242 respectively. The model is more statistically significant but still has little to no predictive power with an accuracy of 0%.
pred <- lm(stepCount ~
week,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-9891.7 -2562.7 -316.5 2083.1 12109.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11399.01 442.18 25.779 < 2e-16 ***
week -52.28 13.30 -3.932 9.95e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3761 on 389 degrees of freedom
Multiple R-squared: 0.03823, Adjusted R-squared: 0.03576
F-statistic: 15.46 on 1 and 389 DF, p-value: 9.953e-05
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
According to the pearson moment and spearman correlation for stepCount and week, there is little to no correlation.
stepsAndWeek <-
cbind(stepsGeneralFrame$stepCount, stepsGeneralFrame$week)
stepsAndWeekPearson <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "pearson")
paste("Pearson moment: ", stepsAndWeekPearson[1, 2])
[1] "Pearson moment: -0.215527086406111"
stepsAndWeekSpearman <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "spearman")
paste("Spearman correlation: ", stepsAndWeekSpearman[1, 2])
[1] "Spearman correlation: -0.232750501841197"
The model does not fit the data, it has a mean squared error of 14076402.
mse <- function(sm) mean(sm$residuals^2)
mseStepsAndWeek <- mse(pred)
mseStepsAndWeek
[1] 14076402
MULTIPLE REGRESSION MODEL 2: STEPS WHILE ON COOP BY WEEK AND PRECIPITATION
From the multiple regression model of stepCount by week and precipitation there seems to be no strong correlation with an adjusted R-squared value of .09279 and p-value of .0007649. The model is statistically significant but still has little to no predictive power with an accuracy of 0%.
stepsCoopFrame <- db$find('{"week": {"$gte":2, "$lte":26}}', '{"type":false, "_id":false}')
stepsCoopFrame <- stepsCoopFrame[-c(1:8),]
stepsCoopFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsCoopFrame), size = nrow(stepsCoopFrame)*.75)
trainingData <- stepsCoopFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsCoopFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week +
precip,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week + precip, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-6496.3 -2584.7 -166.4 2188.9 9632.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10330.6 669.5 15.431 < 2e-16 ***
week 118.0 43.0 2.743 0.00696 **
precip -3062.9 1033.5 -2.964 0.00363 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3491 on 127 degrees of freedom
Multiple R-squared: 0.1069, Adjusted R-squared: 0.09279
F-statistic: 7.597 on 2 and 127 DF, p-value: 0.0007649
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
According to the pearson moment and spearman correlation for stepCount and precipitation, there is little to no correlation.
stepsAndPrecip <-
cbind(stepsCoopFrame$stepCount, stepsCoopFrame$precip)
stepsAndPrecipPearson <-
cor(stepsAndPrecip, use = "pairwise.complete.obs", method = "pearson")
paste("Pearson moment: ", stepsAndPrecipPearson[1, 2])
[1] "Pearson moment: -0.190223808419684"
stepsAndPrecipSpearman <-
cor(stepsAndPrecip, use = "pairwise.complete.obs", method = "spearman")
paste("Spearman correlation: ", stepsAndPrecipSpearman[1, 2])
[1] "Spearman correlation: -0.140973149152474"
The model does not fit the data, it has a mean squared error of 11907208.
mseStepsAndPrecip <- mse(pred)
mseStepsAndPrecip
[1] 11907208
MULTIPLE REGRESSION MODEL 1: STEPS BY WEEK
Calculate simple moving average to smooth the model and tune the model to find an appropriate smoothing order of 14.
plot(ts(stepsGeneralFrame$stepCount), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth1 <- SMA(stepsGeneralFrame$stepCount, n=3)
plot(ts(stepCountGenSmooth1), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth2 <- SMA(stepsGeneralFrame$stepCount, n=8)
plot(ts(stepCountGenSmooth2), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth3 <- SMA(stepsGeneralFrame$stepCount, n=14)
plot(ts(stepCountGenSmooth3), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth4 <- SMA(stepsGeneralFrame$stepCount, n=25)
plot(ts(stepCountGenSmooth4), xlab="days elapsed", ylab="stepCount")
Compare the transformed data distributions.
# remove the first 13 rows that were converted to NA
qplot(stepsGeneralFrame$stepCount[14:length(stepCountGenSmooth3)],
geom="histogram",
binwidth=500)
qplot(stepCountGenSmooth3[14:length(stepCountGenSmooth3)],
geom="histogram",
binwidth=500)
Create a new model with the transformed data. While the model is still weak, there is a slight improvement in both predictive power and statistical significance over the previous model with a new adjusted R-squared value of .1082 and p-value of 2.758e-11 compared to 0.03576 and 9.953e-05 respectively. The model is more statistically significant but still has little to no predictive power with an accuracy of 0%.
stepsSmoothedFrame <- stepsGeneralFrame[-c(1:13),]
stepsSmoothedFrame$stepCount <- stepCountGenSmooth3[14:length(stepCountGenSmooth3)]
stepsSmoothedFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsSmoothedFrame), size = nrow(stepsSmoothedFrame)*.75)
trainingData <- stepsSmoothedFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsSmoothedFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-6359.7 -1239.6 53.4 1017.0 4540.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11375.928 218.386 52.091 < 2e-16 ***
week -45.544 6.636 -6.863 2.76e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1886 on 379 degrees of freedom
Multiple R-squared: 0.1105, Adjusted R-squared: 0.1082
F-statistic: 47.1 on 1 and 379 DF, p-value: 2.758e-11
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
After transforming the data, the pearson moment and spearman correlation reveal that the model of stepCount by week still has a weak correlation but has improved over the previous model before smoothing. The new pearson moment is now -0.341562287996567 up from -0.190223808419684 and the new spearman correlation is now -0.432126201947292 up from -0.140973149152474.
stepsAndWeek <-
cbind(stepsSmoothedFrame$stepCount, stepsSmoothedFrame$week)
stepsAndWeekPearson2 <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "pearson")
paste("Pearson moment: ", stepsAndWeekPearson2[1, 2])
[1] "Pearson moment: -0.341562287996567"
stepsAndWeekSpearman2 <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "spearman")
paste("Spearman correlation: ", stepsAndWeekSpearman2[1, 2])
[1] "Spearman correlation: -0.432126201947292"
The model still does not fit the data, however, its mean squared error of 3539576 is an improvement over the pre-smoothing mse of 11907208.
mseStepsAndWeek2 <- mse(pred)
mseStepsAndWeek2
[1] 3539576
Calculate simple moving average to smooth the model and tune the model to find an appropriate smoothing order of 22.
plot(ts(stepsCoopFrame$stepCount), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth1 <- SMA(stepsCoopFrame$stepCount, n=14)
plot(ts(stepCountCoopSmooth1), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth2 <- SMA(stepsCoopFrame$stepCount, n=22)
plot(ts(stepCountCoopSmooth2), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth3 <- SMA(stepsCoopFrame$stepCount, n=30)
plot(ts(stepCountCoopSmooth3), xlab="days elapsed", ylab="stepCount")
Create a new model with the transformed data. While the model is still weak, there is a slight improvement in both predictive power and statistical significance over the previous model with a new adjusted R-squared value of 0.7719 and p-value of 2.2e-16 compared to .09279 and .0007649 respectively. The model is much more statistically significant but still has little to no predictive power with an accuracy of 0%.
# remove the first 22 values which were converted to NA
length(which(is.na(stepCountCoopSmooth2) == TRUE))
[1] 21
stepsCoopSmoothedFrame <- stepsCoopFrame[-c(1:21),]
stepsCoopSmoothedFrame$stepCount <- stepCountCoopSmooth2[22:length(stepCountCoopSmooth2)]
stepsCoopSmoothedFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsCoopSmoothedFrame), size = nrow(stepsCoopSmoothedFrame)*.75)
trainingData <- stepsCoopSmoothedFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsCoopSmoothedFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week +
precip,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week + precip, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-1423.9 -472.7 38.8 434.1 1214.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9031.310 148.436 60.843 <2e-16 ***
week 165.998 8.468 19.604 <2e-16 ***
precip -109.851 158.344 -0.694 0.489
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 566.1 on 111 degrees of freedom
Multiple R-squared: 0.7759, Adjusted R-squared: 0.7719
F-statistic: 192.2 on 2 and 111 DF, p-value: < 2.2e-16
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0